Deterministic Parsing of Syntactic Non-fluencies
Abstract
It is often remarked that natural language, used naturally, is unnaturally ungrammatical. Spontaneous speech contains all manner of false starts, hesitations, and self-corrections that disrupt the well-formedness of strings. It is a mystery, then, that despite this apparent wide deviation from grammatical norms, people have little difficulty understanding the non-fluent speech that is the essential medium of everyday life. And it is a still greater mystery that children can succeed in acquiring the grammar of a language on the basis of evidence provided by a mixed set of apparently grammatical and ungrammatical strings.

In this paper I present a system of rules for resolving the non-fluencies of speech, implemented as part of a computational model of syntactic processing. The essential idea is that non-fluencies occur when a speaker corrects something that he or she has already said out loud. Since words once said cannot be unsaid, a speaker can only accomplish a self-correction by saying something additional, namely the intended words. The intended words are supposed to substitute for the wrongly produced words. For example, in sentence (1), the speaker initially said "I" but meant "we".

(1) I was- we were hungry.

The problem for the hearer, as for any natural language understanding system, is to determine which words must be expunged from the words actually said in order to recover the intended sentence. Labov (1966) provided the key to solving this problem when he noted that a phonetic signal (specifically, a markedly abrupt cutoff of the speech signal) always marks the site where self-correction takes place. Of course, finding the site of a self-correction is only half the problem; it remains to specify what should be removed. A first guess suggests that this must be a non-deterministic problem, requiring complex reasoning about what the speaker meant to say.
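To make the hearer's problem concrete, here is a minimal sketch that marks the cutoff with a hypothetical `<cut>` token and lists every reading a hearer could reconstruct by deleting some stretch of words that ends at that signal. `EDIT_SIGNAL` and `candidate_readings` are illustrative names, not part of the system described in this paper, and restricting deletions to material ending exactly at the signal is an assumption of the sketch.

```python
from typing import List

# Hypothetical token standing in for the abrupt phonetic cutoff (the edit signal).
EDIT_SIGNAL = "<cut>"


def candidate_readings(words: List[str]) -> List[List[str]]:
    """Enumerate every reading obtained by expunging the last k words before
    the edit signal, for k = 0 up to all of them.

    Simplifying assumptions: only the first edit signal is handled, and the
    expunged material ends exactly at the signal.
    """
    if EDIT_SIGNAL not in words:
        return [list(words)]  # fluent input: a single reading
    cut = words.index(EDIT_SIGNAL)
    before, after = words[:cut], words[cut + 1:]
    # Slice with len(before) - k rather than -k so that k = 0 keeps every word.
    return [before[:len(before) - k] + after for k in range(len(before) + 1)]


for reading in candidate_readings("I was <cut> we were hungry".split()):
    print(" ".join(reading))
# I was we were hungry
# I we were hungry
# we were hungry
```

Even for the short sentence (1) there are three candidates, and a string with n words before the cutoff yields n + 1 of them. A deterministic account needs rules that pick one without searching over the alternatives.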
Labov claimed that a simple set of rules operating on the surface string would specify exactly what should be changed, transforming nearly all non-fluent strings into fully grammatical sentences. The specific set of transformational rules Labov proposed was not formally adequate, in part because they were surface transformations that ignored syntactic constituenthood. But his work forms the basis of the current analysis. Labov's claim was not, of course, that ungrammatical sentences are never produced in speech, for that would clearly be false. Rather, it seems that truly ungrammatical productions represent only a tiny fraction of the spoken output, and in the preponderance of cases, …
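A minimal sketch of what a deterministic rule over the surface string might look like, assuming hand-assigned word categories (`PRO`, `V`, `ADJ`, and `EDIT` for the cutoff are hypothetical tags). The function `match_and_expunge` is an illustration only, not Labov's rules or the rule system of this paper: it compares the category sequences on either side of the edit signal and deletes the matching stretch before it. Like the surface transformations criticized above, it sees only the flat string of tagged words, not constituents.

```python
from typing import List, Tuple

# Hypothetical token standing in for the abrupt phonetic cutoff.
EDIT_SIGNAL = "<cut>"
# A tagged word: (word, category). Categories are hand-assigned for illustration.
Token = Tuple[str, str]


def match_and_expunge(tokens: List[Token]) -> List[Token]:
    """Delete the k tokens just before the edit signal when their category
    sequence equals that of the first k tokens after it.

    Trying k from largest to smallest makes the choice deterministic: the
    longest match wins. Only the first edit signal is handled, and only the
    flat sequence of categories is consulted, not constituent structure.
    """
    words = [w for w, _ in tokens]
    if EDIT_SIGNAL not in words:
        return list(tokens)  # fluent input: nothing to correct
    cut = words.index(EDIT_SIGNAL)
    before, after = tokens[:cut], tokens[cut + 1:]
    for k in range(min(len(before), len(after)), 0, -1):
        if [c for _, c in before[-k:]] == [c for _, c in after[:k]]:
            return before[:-k] + after
    return before + after  # no match: only the edit signal is dropped


sentence = [("I", "PRO"), ("was", "V"), (EDIT_SIGNAL, "EDIT"),
            ("we", "PRO"), ("were", "V"), ("hungry", "ADJ")]
print(" ".join(w for w, _ in match_and_expunge(sentence)))  # we were hungry
```

Applied to sentence (1), the rule expunges "I was" because its categories match those of "we were", and it does so without any search over alternatives. Because it ignores constituency, it can still delete a stretch that does not form a phrase, which is the kind of inadequacy the paragraph attributes to purely surface rules.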
Similar resources
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences and creates a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
Automatic Semantic Role Labeling of Persian Sentences Using Dependency Trees
Automatically identifying words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvement in many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
Pseudo-Projective Dependency Parsing
In order to realize the full potential of dependency-based syntactic parsing, it is desirable to allow non-projective dependency structures. We show how a data-driven deterministic dependency parser, in itself restricted to projective structures, can be combined with graph transformation techniques to produce non-projective structures. Experiments using data from the Prague Dependency Treebank s...
A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of Persian
In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
Multimodal Interactive Parsing
Probabilistic parsing is a fundamental problem in Computational Linguistics, whose goal is obtaining a syntactic structure associated with a sentence according to a probabilistic grammatical model. Recently, an interactive framework for probabilistic parsing has been introduced, in which the user and the system cooperate to generate error-free parse trees. In an early prototype developed accordin...